Abstract - We consider network coding for networks experiencing
worst-case bit-flip errors, and argue that this is a reasonable
model for highly dynamic wireless network transmissions. We 
demonstrate that in this setup prior network error-correcting 
schemes ([10], [11]) can be arbitrarily far from achieving the
optimal network throughput. We propose a new metric for errors 
under this model. Using this metric, we prove a new Hamming- 
type upper bound on the network capacity. We also show a 
commensurate lower bound based on GV-type codes that can be 
used for error-correction. The codes used to attain the lower 
bound are non-coherent (do not require prior knowledge of 
network topology). The end-to-end nature of our design enables 
our codes to be overlaid on classical distributed random linear 
network codes [6]. Further, we free internal nodes from having
to implement potentially computationally intensive link-by-link 
error-correction. 

I. Introduction 

A source Alice wishes to transmit information to a receiver 
Bob over a network with "noisy" links. Such a communication 
problem faces several challenges. 

The primary challenge we consider is that in highly dynamic 
(wireless) environments the noise levels on each link might 
vary significantly across time, and hence be hard to estimate 
well. This issue of variable noise levels exacerbates at least 
two other challenges that had been considered settled by prior 
work. 

One, since noise exists in the network, network coding 
might be dangerous. This is because all nodes mix information, 
so even a small number of bit-flips in transmitted packets 
may end up corrupting all the information flowing in the 
network, causing decoding errors. Prior designs for network 
error-correcting codes exist (e.g. [10], [11]), but as we shall
see they are ineffective against bit-flips in a highly dynamic
noise setting. In particular, one line of work (e.g. 0, [8],
[11]) treats even a single bit-flip in a packet as corresponding
to the entire packet being corrupted, and hence results in rates
that are too pessimistic - the fundamental problem is that the
codes are defined over "large alphabets", and hence are poor
at dealing with bit-flip errors. Another line of work (e.g. [10])
overlays network coding on link-by-link error correction, but 
requires accurate foreknowledge of the noise levels on each 
link to have good performance. 

Two, in dynamic settings, the coding operations of nodes 
in the network may be unknown a priori. Under the bit-flip 
error-model we consider, however, the "transform-estimation" 
strategy of Ho et al. [6] does not work, since any headers
pre-specified for this use can also end up being corrupted. 

In this work we consider simultaneously the reliability 
and universality issues for random linear network coding. 



Namely, we design end-to-end distributed schemes that allow 
reliable network communications in the presence of "worst- 
case" network noise, wherein the erroneous bits can be arbi- 
trarily distributed in different network packets with only the 
constraint that the total number of bit-flips is bounded from 
above. Internal network nodes just do linear network coding. 
Error-correction is only carried out at the receiver(s), which 
also estimates the linear transform imposed on the source's 
data by the network.¹

As noted above, our codes are robust to a wide variety of 
channel conditions - whether the noise bit-flips are evenly dis- 
tributed among all packets, or even adversarially concentrated 
among just a few packets, our codes can detect and correct 
errors up to a network-wide bound on the total number of 
errors. Naive implementations of prior codes (for instance, of 
link-by-link error-correcting codes [10]) that try to correct for
worst-case network conditions may result in network codes
with much lower rates (see the example in Section II-C below).
Thus the naturally occurring diversity of network conditions 
works in our favour rather than against us. 

Also, even though our codes correct binary errors rather 
than errors over larger symbol fields as in prior work, the end- 
to-end nature of our design enables our codes to be overlaid on 
classical linear network codes over finite fields (for instance, 
the random linear network codes of Ho et al. [6]). Further,
we free internal nodes from having to implement potentially 
computationally intensive link-by-link error-correction. 

The main tool used to prove our results is a transform 
metric that may be of independent interest. It is structurally 
similar to the rank-metric used by Silva et al. [11], but
has important differences that give our codes the power of
universal robustness against binary noise (as opposed to the
packet-based noise considered in [11] and [10]).

II. Model 

A. Network model 

We model our network by a directed acyclic multigraph²,
denoted by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the set of nodes and
$\mathcal{E}$ denotes the set of edges. A single source node $s \in \mathcal{V}$ and
a set of sinks $\mathcal{T} \subset \mathcal{V}$ are pre-specified in $\mathcal{V}$. We denote $|\mathcal{E}|$
and $|\mathcal{T}|$, respectively the number of edges and sinks in the

1 As is common in coding theory, the upper and lower bounds on
error-correction we prove also directly lead to corresponding bounds on
error-detection - for brevity we omit discussing error-detection in this work.

2 Our model also allows non-interfering broadcast links in a wireless 
network to be modeled via a directed hypergraph - for ease of notation we 
restrict ourselves to just graphs. 



network, by E and S. A directed edge e leading from node u 
to node v can be represented by the vector (u, v), where u is 
called the tail of e and v is called the head of e. In this case 
e is called an outgoing edge of u and an incoming edge of v. 

The capacity of each edge is one packet - a length-$n$
vector over a finite field $\mathbb{F}_{2^m}$ - here $n$ and $m$ are design
parameters to be specified later. Multiple edges between two
nodes are allowed - this allows us to model links with different
capacities.³ As defined in [1], the network (multicast) capacity,
denoted $C$, is the minimum over all sinks $t \in \mathcal{T}$ of the mincut
of $\mathcal{G}$ from the source $s$ to the sink $t$. Without loss of generality,
we assume there are $C$ edges outgoing from $s$ and $C$ incoming
edges to $t$ for all sinks $t \in \mathcal{T}$.⁴

B. Code model 

The source node $s$ wants to multicast a message $M$ to each
sink $t \in \mathcal{T}$. To simplify notation, we consider henceforth just
a single sink - our analysis can be directly extended to the
multi-sink case. All logarithms in this work are to the base
2, and we use $H(p)$ to denote the binary entropy function
$-p \log p - (1 - p) \log(1 - p)$.
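
As a small illustration, the binary entropy function defined above can be evaluated as follows. This is only a sketch; the function name `binary_entropy` and the convention H(0) = H(1) = 0 are our own choices for the example.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) = -p log2(p) - (1 - p) log2(1 - p).

    The endpoints use the usual convention H(0) = H(1) = 0
    (an assumption of this sketch, taken as the limit of the formula).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
```

The function is symmetric about p = 1/2, where it attains its maximum value of 1.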

Random linear network coding: All internal nodes in the
network perform random linear network coding [6] over a
finite field $\mathbb{F}_{2^m}$. Specifically, each internal node takes
uniformly random linear combinations of its incoming packets
to generate outgoing packets. That is, let $e'$ and $e$ index
incoming and outgoing edges of a node $v$. The linear coding
coefficient from $e'$ to $e$ is denoted by $f_{e',e} \in \mathbb{F}_{2^m}$. Let $Y_e$
denote the packet (length-$n$ vector over $\mathbb{F}_{2^m}$) transmitted on
the edge $e$. Then $Y_e = \sum_{e'} f_{e',e} Y_{e'}$, where the summation is
over all edges $e'$ incoming to the node $v$, and all arithmetic is
performed over the finite field $\mathbb{F}_{2^m}$.

Mapping between $\mathbb{F}_2$ and $\mathbb{F}_{2^m}$: The noise considered in
this work is binary in nature. Hence, to preserve the linear
relationships between inputs and outputs of the network, we
use the mappings given in Lemma 1 in [7]. These map addition
and multiplication over $\mathbb{F}_{2^m}$ to corresponding (vector/matrix)
operations over $\mathbb{F}_2$. More specifically, a bijection is defined
from each symbol (from $\mathbb{F}_{2^m}$) of each packet transmitted on
each edge, to a corresponding length-$m$ bit-vector. For ease of
notation henceforth, for each edge $e$ and each $i \in \{1, \ldots, n\}$,
we use $Y_e$ and $Y_e(i)$ solely to denote respectively the
length-$nm$ and length-$m$ binary vectors resulting from the bijection
operating on packets and their $i$th symbols, rather than the
original analogues over $\mathbb{F}_{2^m}$ traversing that edge $e$. Separately,
each linear coding coefficient $f_{e',e}$ at each node is mapped
via a homomorphism to a specific $m \times m$ binary matrix $F_{e',e}$.
The linear mixing at each node is then taken over the binary
field - each length-$m$ binary vector $Y_e(i)$ (corresponding to

3 By appropriate buffering and splitting edges into multiple edges, any
network can be approximated by such a network with unit capacity edges.

4 In cases where the number of outgoing edges from s (or the number of 
incoming edges to t) is not C, we can add a source super-node (or sink super- 
node) with C noiseless edges connecting to the original source (or sink) of 
the network. The change in the number of edges and probability of error on 
each edge are small compared to those of the original network, so our analysis 
essentially still applies. 



the binary mapping of the $i$th symbol of the packet $Y_e$ over
the field $\mathbb{F}_{2^m}$) equals $\sum_{e'} F_{e',e} Y_{e'}(i)$. It is shown in [7] that
an isomorphism exists between the binary linear operations
defined above, and the original linear network code. In what
follows, depending on the context, we use the homomorphism
to switch between the scalar (over $\mathbb{F}_{2^m}$) and matrix (over $\mathbb{F}_2$)
forms of the network codes' linear coding coefficients, and
the isomorphism to switch between the scalar (over $\mathbb{F}_{2^m}$) and
vector (over $\mathbb{F}_2$) forms of each symbol in each packet.
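
To make the scalar-to-matrix mapping concrete, the sketch below builds, for a field element f, an m x m binary matrix whose action on the bit-vector of a symbol y reproduces the bit-vector of the field product f·y. The choices m = 3, the irreducible polynomial x^3 + x + 1, the polynomial-basis bit ordering, and all function names are assumptions made for this example; they are not taken from the paper or from the cited lemma.

```python
M = 3
IRRED = 0b1011  # x^3 + x + 1, irreducible over F_2 (assumed choice)


def gf_mul(a, b):
    """Multiply two elements of F_{2^M}, each stored as an M-bit integer."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in F_{2^M} is bitwise XOR
        b >>= 1
        a <<= 1
        if a >> M:               # degree reached M: reduce modulo IRRED
            a ^= IRRED
    return result


def to_bits(a):
    """Length-M bit-vector of a; entry j is the coefficient of x^j."""
    return [(a >> j) & 1 for j in range(M)]


def mul_matrix(f):
    """M x M binary matrix F with F * to_bits(y) = to_bits(f * y) over F_2."""
    # Column j of F is the bit-vector of f * x^j.
    cols = [to_bits(gf_mul(f, 1 << j)) for j in range(M)]
    return [[cols[j][r] for j in range(M)] for r in range(M)]


def mat_vec(F, v):
    """Matrix-vector product over F_2."""
    return [sum(F[r][j] & v[j] for j in range(M)) % 2 for r in range(M)]
```

With these definitions, `mat_vec(mul_matrix(f), to_bits(y))` agrees with `to_bits(gf_mul(f, y))` for every pair of field elements, which is the property that lets the network's linear mixing be carried out over the binary field.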
Noise: We consider "worst-case noise" in this work, wherein
an arbitrary number of bit-flips can happen in any transmitted
packet, subject to the constraint that no more than a fraction
$p$ of the bits over all transmitted packets are flipped. The noise matrix
$Z$ is an $Em \times n$ binary matrix with at most $pEmn$ nonzero
entries which can be arbitrarily distributed. In particular,
rows $m(i-1)+1$ through $mi$ of $Z$ represent the bit
flips in the $i$th packet $Y_{e_i}$ transmitted over the network. If the
$((k-1)m + j)$th bit of the length-$mn$ binary vector $Y_{e_i}$ is flipped (that
is, the $j$th bit of the $k$th symbol over $\mathbb{F}_{2^m}$ in $Y_{e_i}$ is flipped),
then the $(m(i-1)+j, k)$ entry of $Z$ equals 1, else it equals 0.
Thus the noise matrix $Z$ represents the noise pattern of the
network. To model the noise as part of the linear transform
imposed by the network, we add an artificial super-node $s'$
connected to all the edges in the network, injecting noise into
each packet transmitted on each edge in the network according
to entries of the noise matrix $Z$.
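
The indexing of the noise matrix can be sketched as follows. Given a list of flipped bits, each described by the edge index i, the symbol index k and the bit index j (all 1-indexed, as in the text), the code fills in the corresponding entries of Z and checks the worst-case budget. The function names and the input format are hypothetical and chosen only for illustration.

```python
def noise_matrix(E, m, n, flips):
    """Build the Em x n binary noise matrix Z.

    flips: iterable of (i, k, j), all 1-indexed, meaning that bit j of
    symbol k of the packet on edge i is flipped (assumed input format).
    """
    Z = [[0] * n for _ in range(E * m)]
    for i, k, j in flips:
        if not (1 <= i <= E and 1 <= k <= n and 1 <= j <= m):
            raise ValueError("flip index out of range: %r" % ((i, k, j),))
        # Entry (m(i-1)+j, k) in 1-indexed notation.
        Z[m * (i - 1) + j - 1][k - 1] = 1
    return Z


def within_budget(Z, p):
    """True if Z has at most p * E * m * n ones in total."""
    total_flips = sum(map(sum, Z))
    return total_flips <= p * len(Z) * len(Z[0])
```

For instance, with E = 3, m = 2 and n = 4, the flips (1, 1, 1), (3, 4, 2) and (3, 4, 1) set entries in rows 1, 6 and 5 of Z; two of the three flips fall in the same symbol of the same packet, which the worst-case model permits.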

Source: The source has a set of $2^{Rmn}$ messages $\{M\}$ it
wishes to communicate to each sink, where $R$ is the rate of
the source. Corresponding to each message $M$ it generates a
codeword $X(M)$ using the encoders specified in Section IV-B
(to make notation easier we usually do not explicitly reference
the parameter $M$ and instead refer simply to $X$). This $X$ is
represented by a $C \times n$ matrix over $\mathbb{F}_{2^m}$, or alternatively a
$Cm \times n$ matrix over $\mathbb{F}_2$. Each row of this matrix corresponds
to a packet transmitted over a distinct edge leaving the source.
Receiver(s): Each sink $t$ receives a batch of $C$ packets.
Similarly to the source, it organizes the received packets into a
matrix $Y$, which can be equivalently viewed as a $C \times n$ matrix
over $\mathbb{F}_{2^m}$ or a $Cm \times n$ binary matrix. Each sink $t$ decodes the
message $M$ from the received matrix $Y$ using the decoders
specified in Section IV-B.

Transfer matrix and Impulse response matrix: Having
defined the linear coding coefficients of internal nodes, the
packets transmitted on the incoming edges of each sink $t$
can inductively be calculated as linear combinations of the
packets on the outgoing edges of $s$. We denote the $C \times C$
transfer matrix from the outgoing edges of $s$ to the incoming
edges of $t$ by $T$, over the finite field $\mathbb{F}_{2^m}$. Alternatively, using
the homomorphism described above, $T$ may be viewed as a
$Cm \times Cm$ binary matrix.

We similarly define $\hat{T}$ to be the impulse response matrix,
which is the transfer matrix from an imaginary source $s'$, which
injects errors into all edges, to the sink $t$. Note that $T$ is a
sub-matrix of $\hat{T}$, composed specifically of the $C$ columns of
$\hat{T}$ corresponding to the $C$ outgoing edges of $s$.

In this work we require that every $C \times C$ sub-matrix
of $\hat{T}$ is invertible. As noted in, for instance, [9], O this
happens with high probability for random linear network
codes. Alternatively, deterministic designs of network
error-correcting codes [12] also have this property.

Using the above definitions the network can thus be
abstracted by equation (1) below as a worst-case binary-error
network channel.

$$Y = TX + \hat{T}Z. \qquad (1)$$

Similar equations have been considered before (for instance
in 0, [8], [11]) - the key difference in this work is that we are
interested in $Z$ matrices which are fundamentally best defined
over the binary field, and hence, when needed, transform the
other matrices in (1) also into binary matrices.
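
The channel equation (1) can be simulated directly once all matrices are written over the binary field. The sketch below does this for a toy case with m = 1, C = 2, E = 3 and n = 3. The specific impulse response matrix, the assumption that the transfer matrix consists of its first C columns, and the function names are all illustrative choices rather than anything specified in the paper.

```python
def mat_mul_gf2(A, B):
    """Matrix product over F_2 for matrices stored as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[t] & B[t][c] for t in range(inner)) % 2
             for c in range(cols)] for row in A]


def mat_add_gf2(A, B):
    """Entrywise sum over F_2 (bitwise XOR)."""
    return [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def channel(T, T_hat, X, Z):
    """Received matrix Y = T X + T_hat Z over F_2."""
    return mat_add_gf2(mat_mul_gf2(T, X), mat_mul_gf2(T_hat, Z))


# Toy parameters (assumed): m = 1, C = 2, E = 3, n = 3.
T_HAT = [[1, 0, 1],
         [0, 1, 1]]               # impulse response matrix, C x E
T = [row[:2] for row in T_HAT]    # transfer matrix: first C columns (assumed)
X = [[1, 0, 1],
     [0, 1, 1]]                   # transmitted codeword, C x n
Z = [[0, 0, 0],
     [0, 0, 0],
     [1, 0, 0]]                   # one bit-flip on the third edge, first symbol
Y = channel(T, T_HAT, X, Z)
```

In this example a single bit-flip on an internal edge changes both rows of the first received column, which illustrates how network coding spreads one binary error across several received packets.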
Performance of code: The source encoders and channel
decoders specified in Section IV-B together comprise
worst-case binary-error-correcting network codes. A good
worst-case binary-error-correcting network channel code has the
property that, for all messages $M$, and noise patterns $Z$ with
at most $pEmn$ bit-flips, the decoded message $\hat{M}$ equals $M$. A rate $R$ is said to be
achievable for the worst-case binary-error channel if, for all
sufficiently large $n$, there exists a good code with rate $R$.

C. Toy Example 

We demonstrate via an example that in networks with 
worst-case bit-errors, prior schemes have inferior performance 
compared to our scheme. In Figure 1 the network has $C$
paths (with a total of $2C$ links that might experience
worst-case bit-flip errors).

Benchmark 1: If link-by-link error-correction⁵ is applied as
in [10], every link is then required to be able to correct
$2Cpn$ worst-case bit-flip errors (since all the bit-errors may be
concentrated in any single link). Using GV codes (B, [12]) a
rate of $1 - H(4Cp)$ is achievable on each link, and hence the
overall rate scales as $C(1 - H(4Cp))$. As $C$ increases without
bound, the throughput thus actually goes to zero. The primary
reason is that every link has to prepare for the worst-case
number of bit-flips aggregated over the entire network, but in
large networks, the total number of bit-flips in the worst case
might be too much for any single link to be able to tolerate.
Benchmark 2: Consider now a more sophisticated scheme,
combining link-by-link error correction with end-to-end
error-correction as in [11]. Suppose each link can correct $2n\frac{C}{k}p$
worst-case bit-flips, where $k$ is a parameter to be determined
such that the rate is optimized. Then at most $k$ links will
fail. Overlaying an end-to-end network error-correcting code
as in [11] with link-by-link error-correcting codes such as
GV codes (effectively leading to a concatenation-type scheme)
leads to an overall rate of $(C - 2k)(1 - H(\frac{4Cp}{k}))$. For large
$C$, this is better than the previous benchmark scheme since
interior nodes no longer attempt to correct all worst-case
errors and hence can operate at higher rates - the
end-to-end code corrects the errors on those links that do experience
5 Since interior nodes might perform network coding, naive implementations
of end-to-end error-correcting codes are not straightforward - indeed, designing
such codes is the primary goal of our constructions.




Fig. 1. A network with C parallel paths from the source to the destination. 
Fig. 2. Transform metric: the minimal number of columns of $B$ that need
to be added to $M_1(i)$ to obtain $M_2(i)$ is $\delta(i)$.

errors. Nonetheless, as we observe below, our scheme still
outperforms this scheme, since concatenation-type schemes in
general have lower rates than single-layer schemes.
Our scheme: The rate achieved by our scheme (as demonstrated
in Section IV-B) is at least $C(1 - 2H(2p))$. As can
be verified, this rate is higher than that of either benchmark
scheme.
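
The comparison in this toy example can be checked numerically for particular values of C and p. The sketch below evaluates the three rate expressions, taking the Benchmark 2 rate to be (C - 2k)(1 - H(4Cp/k)) maximized over the integer k, which follows from a per-link budget of 2Cpn/k bit-flips. The function names, the clamping of the GV rate to zero once its entropy argument reaches 1/2, and the sample parameters are assumptions of this sketch.

```python
import math


def H(x):
    """Binary entropy in bits, with H(x) = 0 outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def gv_link_rate(flip_fraction):
    """GV rate 1 - H(2 * flip_fraction); taken as 0 once the argument
    reaches 1/2 (an assumption of this sketch)."""
    x = 2.0 * flip_fraction
    return 1.0 - H(x) if x < 0.5 else 0.0


def benchmark1(C, p):
    """Link-by-link correction: every link corrects a fraction 2Cp of flips."""
    return C * gv_link_rate(2.0 * C * p)


def benchmark2(C, p):
    """Concatenated scheme, optimized over the number k of failing links."""
    return max(((C - 2 * k) * gv_link_rate(2.0 * C * p / k)
                for k in range(1, (C - 1) // 2 + 1)), default=0.0)


def our_scheme(C, p):
    """End-to-end rate C(1 - 2H(2p)) for the network with E = 2C links."""
    return C * max(0.0, 1.0 - 2.0 * H(2.0 * p))
```

For example, with C = 100 and p = 0.001 these functions give roughly 2.9 for Benchmark 1, about 60 for Benchmark 2, and about 95.8 for the end-to-end scheme, in line with the ordering described above.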

III. Transform Metric 

We first define a "natural" distance function between binary
matrices $M_1$ and $M_2$ related as $M_1 = M_2 + BZ$ for some
matrices $B$ and $Z$.

Let $M_1$ and $M_2$ be arbitrary $a \times b$ binary matrices. Let $B$
be a given $a \times c$ matrix with full column rank. Let $M_1(i)$ and
$M_2(i)$ denote respectively the $i$th columns of $M_1$ and $M_2$. We
define $d_B(M_1, M_2)$, the transform distance between $M_1$ and
$M_2$ in terms of $B$, as follows.

Definition 1. Let $\delta(i)$ denote the minimal number of columns
of $B$ that need to be added to $M_1(i)$ to obtain $M_2(i)$. Then
the transform distance $d_B(M_1, M_2)$ equals $\sum_{i=1}^{b} \delta(i)$.
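
Definition 1 can be evaluated by brute force on small examples. The sketch below searches, for each column, over subsets of the columns of B in order of increasing size and returns the first size whose XOR matches the difference of the two columns. The exhaustive search is exponential in the number of columns of B and is meant only to illustrate the definition; the function names are our own.

```python
from itertools import combinations


def column(M, i):
    """The i-th column of a matrix stored as a list of rows."""
    return tuple(row[i] for row in M)


def delta(B, target):
    """Least number of columns of B whose sum over F_2 equals target.

    Returns None if target is not in the column span of B.
    """
    cols = [column(B, j) for j in range(len(B[0]))]
    target = tuple(target)
    for size in range(len(cols) + 1):
        for subset in combinations(cols, size):
            acc = (0,) * len(target)
            for col in subset:
                acc = tuple(a ^ b for a, b in zip(acc, col))
            if acc == target:
                return size
    return None


def transform_distance(B, M1, M2):
    """Sum over columns i of delta(i), as in Definition 1."""
    total = 0
    for i in range(len(M1[0])):
        diff = tuple(x ^ y for x, y in zip(column(M1, i), column(M2, i)))
        d = delta(B, diff)
        if d is None:
            raise ValueError("column %d difference is not reachable from B" % i)
        total += d
    return total
```

As a usage example, with B = [[1, 0, 1], [0, 1, 1]], the all-zero 2 x 2 matrix and M2 = [[1, 1], [1, 0]] are at transform distance 2, whereas with B equal to the 2 x 2 identity the same pair is at distance 3, so the metric depends on B and not only on the Hamming weight of the difference.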

IV. Main results 

In this section we present our main results. In Theorem 1
in Subsection IV-A we present an upper bound on the rates
of communication achievable by any code over networks that
have "worst-case" bit-flip errors. Our bounding technique is
motivated by the corresponding Hamming bound technique in
classical coding theory [5] - the main challenge lies in deriving
good lower bounds for the "volumes of spheres" in the channel
model and corresponding metric defined in Section III.



In Subsection IV-B we discuss schemes that achieve "good"
rates of communication over networks that have "worst-case"
bit-flip errors. We present three schemes motivated by the
well-known Gilbert-Varshamov (GV) bound from classical
coding theory H, [12] - again, the challenge lies in deriving
good upper bounds on the volumes of spheres in the metric
we define. Theorem 2 considers the coherent scenario, i.e.,
when the linear coding coefficients in the network (or at least
the transfer matrix $T$ and the impulse response matrix $\hat{T}$)
are known in advance to the receiver. We use this setting
primarily for exposition, since the proof is somewhat simpler
than the proof for the non-coherent setting, when no
information about the topology of the network, the linear
coding coefficients used, or $T$ or $\hat{T}$ is known in advance to
the receiver. In Theorem 3 we are able to demonstrate that
essentially the same rates as in Theorem 2 are still achievable,
albeit with a rate-loss that is asymptotically negligible in the
block-length $n$.

As we see below, the functional forms of both the 
Hamming-type upper bounds and the GV-type lower bounds 
we derive are structurally very similar to those of the classical 
Hamming and GV bounds. 

A. Hamming-type bound 

Theorem 1. For all $p$ less than $C/(2Em)$ an upper bound on
the achievable rate of any code over the worst-case
binary-error channel is $1 - H(p)\frac{E}{C}$.

Proof: Since each transmitted codeword $X$ is a $Cm \times n$
binary matrix, the number of possible choices of $X$ is at most
$2^{Cmn}$. Suppose $X$ is transmitted. By the definition of the
worst-case bit-error channel, the received $Y$ lies in the
radius-$pEmn$ ball (in the transform metric) $B_{\hat{T}}(TX, pEmn)$ defined
as $\{Y \mid d_{\hat{T}}(TX, Y) \le pEmn\}$. For the message corresponding
to $X$ to be uniquely decodable, it is necessary that the balls
$B_{\hat{T}}(TX, pEmn)$ be non-intersecting for each $X$ chosen to be
in the codebook. Hence to get an upper bound on the number
of codewords that can be chosen, we need to derive a lower
bound on the volume of $B_{\hat{T}}(TX, pEmn)$. Recall that $Y$ equals
$TX + \hat{T}Z$. Hence we need to bound from below the number
of distinct values of $\hat{T}Z$ for $Z$ with at most $pEmn$ ones.

We consider the case that $Z$ has exactly $pEmn$ ones that are
equally distributed among the columns of $Z$ - hence every column
of $Z$ has $pEm$ ones in it. We now show that, in the worst case,
every such distinct matrix $Z$ results in a distinct $\hat{T}Z$. Suppose
not - in that case there exist distinct $Z$ and $Z'$ with $pEm$ ones
in each column of both matrices such that $\hat{T}Z$ equals $\hat{T}Z'$, i.e.,
$\hat{T}(Z - Z')$ equals the zero matrix. In particular, for at least
one column in which $Z$ and $Z'$ differ, say $Z(i)$ and $Z'(i)$, it must be the
case that $\hat{T}(Z(i) - Z'(i))$ equals 0. But by assumption each
column of both $Z$ and $Z'$ has $pEm < C/2$ ones, and
hence $Z(i) - Z'(i)$ has fewer than $C$ ones in it.

We now view $\hat{T}$ and $Z$ as matrices over $\mathbb{F}_{2^m}$. From the
argument above, $Z(i) - Z'(i)$ has fewer than $C$ non-zero
elements over $\mathbb{F}_{2^m}$ in it (since an element of $\mathbb{F}_{2^m}$ is zero
if and only if each of the $m$ bits in its binary representation
is zero). Hence $\hat{T}(Z(i) - Z'(i))$ is a linear combination over
$\mathbb{F}_{2^m}$ of strictly fewer than $C$ columns of $\hat{T}$. But as to the matrix
$\hat{T}$ viewed over $\mathbb{F}_{2^m}$, since we are deriving a worst-case upper
bound, we can also require that every $C \times C$ sub-matrix of $\hat{T}$ is
invertible (as noted before this happens with high probability
for random linear network codes). Hence $\hat{T}(Z(i) - Z'(i))$
cannot equal the zero vector, which leads to a contradiction.

Hence the number of distinct values for $\hat{T}Z$ is at least the
number of distinct values for $Z$ with $pEm$ ones in each
column. This is at least $\binom{Em}{pEm}^n$, which by Stirling's
approximation [5] is at least $2^{H(p)Emn - \log(Em+1)}$. The total
number of $Cm \times n$ binary matrices is $2^{Cmn}$. Thus an upper
bound on the size of any codebook for the worst-case
binary-error channel is

$$\frac{2^{Cmn}}{2^{EmnH(p) - \log(Em+1)}} = 2^{\left(1 - H(p)\frac{E}{C} + \frac{\log(Em+1)}{Cmn}\right)Cmn},$$

which, asymptotically in $n$, gives the Hamming-type upper
bound on the rate of any code as $1 - H(p)\frac{E}{C}$. □
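
The Stirling-type step in this proof rests on a standard estimate for a single binomial coefficient, namely that the number of weight-K binary vectors of length N is at least 2^{N H(K/N)} / (N + 1). The sketch below checks this estimate numerically for small N. Reading N as Em and K as pEm is our own gloss for the example, and the function names are hypothetical.

```python
import math


def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def stirling_lower_bound_holds(N, K):
    """Check binom(N, K) >= 2^{N H(K/N)} / (N + 1)."""
    return math.comb(N, K) * (N + 1) >= 2.0 ** (N * H(K / N))


# Example with assumed parameters E = 8, m = 2, p = 1/8, so N = 16, K = 2.
N, K = 16, 2
exact = math.comb(N, K)                       # number of weight-K columns
estimate = 2.0 ** (N * H(K / N)) / (N + 1)    # Stirling-type lower estimate
```

The check passes for every 0 <= K <= N in the tested range, and it shows that the estimate is loose only by a factor polynomial in N.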

B. Gilbert-Varshamov-type bounds 

1) Coherent GV-type network codes: We first discuss the
case when the network transfer matrix $T$ and impulse response
matrix $\hat{T}$ are known in advance.

Codebook design: Initialize the set $S$ as the set of all binary
$Cm \times n$ matrices. Choose a uniformly random $Cm \times n$
binary matrix $X$ as the first codeword. Eliminate from $S$ all
matrices in the radius-$2pEmn$ ball (in the transform metric)
$B_{\hat{T}}(TX, 2pEmn)$. Then choose a matrix $Y'$ uniformly at
random in the remaining set and choose $X' = T^{-1}Y'$ as
the second codeword. Now, further eliminate all matrices in
the radius-$2pEmn$ ball $B_{\hat{T}}(TX', 2pEmn)$ from $S$, choose
a random $Y''$ from the remaining set, and choose the third
codeword $X''$ as $X'' = T^{-1}Y''$. Repeat this procedure until
the set $S$ is empty.
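
The greedy procedure above can be run exhaustively on a very small instance. The sketch below assumes m = 1, C = 2, E = 3 and n = 3, an impulse response matrix whose first two columns form the identity (so that the transfer matrix is the identity and each codeword X equals the chosen matrix Y), and a seeded pseudo-random generator in place of true uniform sampling. All of these choices, and the function names, are illustrative assumptions rather than the paper's construction parameters.

```python
import random
from itertools import combinations, product

# Toy impulse response matrix (assumed): C = 2 rows, E = 3 columns, m = 1.
T_HAT = [[1, 0, 1],
         [0, 1, 1]]
C, E, N = 2, 3, 3


def column_weights():
    """Map each reachable length-C column to the least number of
    T_HAT columns that sum to it over F_2."""
    cols = [tuple(row[j] for row in T_HAT) for j in range(E)]
    table = {}
    for size in range(E + 1):          # sizes visited in increasing order
        for subset in combinations(cols, size):
            acc = (0,) * C
            for col in subset:
                acc = tuple(a ^ b for a, b in zip(acc, col))
            table.setdefault(acc, size)
    return table


WEIGHT = column_weights()


def distance(Y1, Y2):
    """Transform distance between matrices stored as tuples of columns."""
    return sum(WEIGHT[tuple(a ^ b for a, b in zip(c1, c2))]
               for c1, c2 in zip(Y1, Y2))


def greedy_codebook(radius, seed=0):
    """Pick centers until every matrix lies within `radius` of some center."""
    rng = random.Random(seed)
    all_columns = list(product((0, 1), repeat=C))
    remaining = set(product(all_columns, repeat=N))   # all C x N matrices
    centers = []
    while remaining:
        Y = rng.choice(sorted(remaining))
        centers.append(Y)   # T is the identity here, so X = Y
        remaining = {W for W in remaining if distance(Y, W) > radius}
    return centers
```

Calling `greedy_codebook(radius=2)` corresponds to a budget of one bit-flip (pEmn = 1, ball radius 2pEmn = 2). Any two selected centers are then at transform distance greater than 2, so the radius-1 balls around them are disjoint.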

Theorem 2. Coherent GV-type network codes achieve a rate
of at least $1 - H(2p)\frac{E}{C}$.

Proof: For this theorem, we need an upper bound
on the volume of $B_{\hat{T}}(TX, 2pEmn)$ (rather than a lower bound on
that of $B_{\hat{T}}(TX, pEmn)$ as in Theorem 1). Recall that $Y = TX + \hat{T}Z$.
The number of different $Y$, or equivalently, different $\hat{T}Z$, can
be bounded from above by the number of different $Z$. This
equals $\sum_{i=0}^{2pEmn} \binom{Emn}{i}$. The dominant term in this summation
is the one with $i$ equal to $2pEmn$. Hence the summation can be
bounded from above by $(2pEmn + 1)\binom{Emn}{2pEmn}$. By Stirling's
approximation [5] this is at most $(2pEmn + 1)2^{H(2p)Emn}$.

Thus a lower bound on the size of the codebook for coherent
GV-type network codes is

$$\frac{2^{Cmn}}{(2pEmn + 1)2^{H(2p)Emn}} = 2^{\left(1 - H(2p)\frac{E}{C} - \frac{\log(2pEmn+1)}{Cmn}\right)Cmn},$$

which, asymptotically in $n$, gives the rate of coherent GV-type
network codes as $1 - H(2p)\frac{E}{C}$. □
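
The two estimates used in this proof, and the way the resulting rate approaches its limit, can be checked numerically. The sketch below compares the exact number of binary vectors of weight at most K with the bounds (K + 1)·binom(N, K) and (K + 1)·2^{N H(K/N)} for K <= N/2, and evaluates the finite-n rate 1 - H(2p)E/C - log(2pEmn + 1)/(Cmn). The function names and the sample parameters C = 4, E = 8, m = 2, p = 0.01 are assumptions made for illustration.

```python
import math


def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def ball_volume_bounds(N, K):
    """Return (exact, dominant, entropy) for K <= N/2, where
    exact    = sum_{i <= K} binom(N, i),
    dominant = (K + 1) * binom(N, K),
    entropy  = (K + 1) * 2^{N H(K/N)}."""
    exact = sum(math.comb(N, i) for i in range(K + 1))
    dominant = (K + 1) * math.comb(N, K)
    entropy = (K + 1) * 2.0 ** (N * H(K / N))
    return exact, dominant, entropy


def coherent_gv_rate(C, E, m, n, p):
    """Finite-n rate implied by the codebook-size lower bound."""
    return (1.0 - H(2.0 * p) * E / C
            - math.log2(2.0 * p * E * m * n + 1.0) / (C * m * n))


# Assumed sample parameters: the rate climbs toward 1 - H(2p)E/C as n grows.
limit = 1.0 - H(0.02) * 8 / 4
rate_short = coherent_gv_rate(4, 8, 2, 10, 0.01)
rate_long = coherent_gv_rate(4, 8, 2, 1000, 0.01)
```

With these parameters the limiting rate is about 0.717, and the logarithmic penalty term is already below 0.002 at n = 1000.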



2) Non-coherent GV-type network codes: The assumption
that $T$ and $\hat{T}$ are known in advance to the receiver is often
unrealistic, since the random linear coding coefficients in the
network are usually chosen on the fly. Hence we now consider
the non-coherent setting, wherein $T$ and $\hat{T}$ are not known in
advance. We demonstrate that despite this lack of information
the same rates as in Theorem 2 are achievable in the
non-coherent setting.

The number of all possible $\hat{T}$ is at most $2^{CEm}$ since $\hat{T}$
is a $C \times E$ matrix over $\mathbb{F}_{2^m}$ - the crucial observation is that this
number is independent of the block-length $n$. Hence in the
non-coherent GV setting, we consider all possible values of
$\hat{T}$ (and hence $T$, since it comprises a specific subset of $C$
columns of $\hat{T}$).

Codebook design: Initialize the set $S$ as the set of all binary
$Cm \times n$ matrices. Choose a uniformly random $Cm \times n$ binary
matrix $X$ as the first codeword. For each $C \times E$ matrix $\hat{T}$ (over
the field $\mathbb{F}_{2^m}$), eliminate from $S$ all matrices in the
radius-$2pEmn$ ball (in the transform metric) $B_{\hat{T}}(TX, 2pEmn)$. Then
choose a matrix $Y'$ uniformly at random in the remaining
set and choose $X' = T^{-1}Y'$ as the second codeword.
Now, further eliminate all matrices in the radius-$2pEmn$
ball $B_{\hat{T}}(TX', 2pEmn)$ from $S$, choose a random $Y''$ from
the remaining set, and choose the third codeword $X''$ as
$X'' = T^{-1}Y''$. Repeat this procedure until the set $S$ is empty.

Theorem 3. Non-coherent GV-type network codes achieve a
rate of at least $1 - H(2p)\frac{E}{C}$.

Proof: The crucial difference with the proof of Theorem 2
is in the process of choosing codewords - at
each stage of the codeword elimination process, at most
$2^{CEm}|B_{\hat{T}}(TX', 2pEmn)|$ potential codewords are eliminated
(rather than $|B_{\hat{T}}(TX', 2pEmn)|$ potential codewords as in
Theorem 2). Hence the number of potential codewords that
can be chosen in the codebook is at least

$$\frac{2^{Cmn}}{2^{CEm}(2pEmn + 1)2^{H(2p)Emn}}$$

which equals

$$2^{\left(1 - H(2p)\frac{E}{C} - \left(\frac{\log(2pEmn+1)}{Cmn} + \frac{E}{n}\right)\right)Cmn}.$$

As can be verified, asymptotically in $n$ this leads to the same
rate of $1 - H(2p)\frac{E}{C}$ as in Theorem 2. □
Note: Our proposed codes can be implemented via concatenation schemes so
that their encoding and decoding complexity grows only
polynomially in the block-length (albeit exponentially in network
parameters).

C. Scale of Parameters 

We now investigate the regime of p wherein our results are 
meaningful. 

Claim 1. For all $p$ less than $\min\{\frac{C}{2Em}, 2^{-(m+1)}\}$ the
Hamming-type and GV-type bounds hold.

Proof: The Hamming-type bound in Theorem 1 requires
$pEm < \frac{C}{2}$.

For the GV-type bound in Theorems 2 and 3 to give
non-negative rates, we need $H(2p)\frac{E}{C} < 1$. When $p$ is very small,

$$H(2p)\frac{E}{C} \approx 2p\log(1/(2p))\frac{E}{C} \qquad (2)$$
$$< \frac{C}{Em}\log(1/(2p))\frac{E}{C} = \frac{\log(1/(2p))}{m} \qquad (3)$$
$$< 1 \qquad (4)$$

where (2) follows from the limiting behaviour of the binary
entropy function for small $p$, (3) is because $p < C/(2Em)$
(our first condition), and (4) is because $p < 2^{-(m+1)}$ (our
second condition).

V. Conclusion 

In this work we investigate upper and lower bounds for the 
performance of end-to-end error-correcting codes for worst- 
case binary errors. This model is appropriate for highly dy- 
namic wireless networks, wherein the noise-levels on individ- 
ual links might be hard to accurately estimate. We demonstrate 
significantly better performance for our proposed schemes, 
compared to prior benchmark schemes. 